Recognizing Chemical Entities in Biomedical Literature using Conditional Random Fields and Structured Support Vector Machines

Authors

  • Buzhou Tang
  • Xiaolong Wang
  • Yonghui Wu
  • Min Jiang
  • Jingqi Wang
  • Hua Xu
Abstract

The Spanish National Cancer Research Center (CNIO) and the University of Navarra organized a challenge on recognizing chemical compounds and drugs (chemical entities) in biomedical literature, comprising two subtasks: 1) chemical entity mention recognition (CEM); and 2) chemical document indexing (CDI). The challenge organizers manually annotated chemical entities in 10,000 abstracts from PubMed, of which 3,500 abstracts were used as a training set, 3,500 as a development set, and 3,000 as a test set. We participated in subtask 1 and developed a machine learning-based system using two state-of-the-art sequence labeling algorithms: Conditional Random Fields (CRF) and Structured Support Vector Machines (SSVM). Our best model, built on the training set, achieved the highest F-measure, 0.81862, for CEM on the development set.
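As an illustration of how an F-measure for mention recognition can be computed, here is a minimal Python sketch. It assumes a B/I/O tag scheme and exact-span matching, neither of which is stated in the abstract; the function names `bio_to_spans` and `f_measure` are hypothetical and this is not the challenge's official scorer.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a B/I/O tag sequence into a set of entity spans (assumed scheme)."""
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new entity begins; close any entity still open.
            if start is not None:
                spans.add((start, i))
            start = i
        elif tag == "I":
            # Tolerate an "I" with no preceding "B" by opening a span here.
            if start is None:
                start = i
        else:
            # Any other tag (e.g. "O") closes an open entity.
            if start is not None:
                spans.add((start, i))
                start = None
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def f_measure(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) under exact-span matching."""
    gold_spans = bio_to_spans(gold)
    pred_spans = bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f
```

For example, with gold tags `["B", "I", "O", "B", "O"]` and predicted tags `["B", "I", "O", "O", "B"]`, one of two gold mentions is matched exactly, so precision, recall, and F-measure are all 0.5.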

Similar resources

A comparison of conditional random fields and structured support vector machines for chemical entity recognition in biomedical literature

BACKGROUND Chemical compounds and drugs (together called chemical entities) embedded in scientific articles are crucial for many information extraction tasks in the biomedical domain. However, only a very limited number of chemical entity recognition systems are publicly available, probably due to the lack of large manually annotated corpora. To accelerate the development of chemical entity r...

CD-REST: a system for extracting chemical-induced disease relation in literature

Mining chemical-induced disease relations embedded in the vast biomedical literature could facilitate a wide range of computational biomedical applications, such as pharmacovigilance. In 2015, BioCreative V organized a Chemical Disease Relation (CDR) Track on extracting chemical-induced disease relations from biomedical literature. We participated in all subtasks of this challenge. In ...

Efficient Structured Prediction with Latent Variables for General Graphical Models

In this paper we propose a unified framework for structured prediction with latent variables which includes hidden conditional random fields and latent structured support vector machines as special cases. We describe a local entropy approximation for this general formulation using duality, and derive an efficient message passing algorithm that is guaranteed to converge. We demonstrate its effec...

Disease named entity recognition by combining conditional random fields and bidirectional recurrent neural networks

The recognition of disease and chemical named entities in scientific articles is a very important subtask of information extraction in the biomedical domain. Due to the diversity and complexity of disease names, recognizing disease named entities is considerably harder than recognizing chemical names. Although there are some remarkable chemical named entity recognition systems available onli...

UTH-CCB@BioCreative V CDR Task: Identifying Chemical-induced Disease Relations in Biomedical Text

This paper describes the system developed by the UTH-CCB team from the University of Texas Health Science Center at Houston (UTHealth), for the 2015 BioCreative V shared tasks of Track 3 on extraction of chemical disease relation (CDR). We participated in both tasks: Task A for “Disease Named Entity Recognition and Normalization (DNER)” and Task B for “Chemical-induced Diseases Relation Extract...

Publication date: 2013